Abstract

Background: The cellular response to DNA damage is immediate and highly coordinated in order to maintain
genome integrity and proper cell division. During the DNA damage response (DDR), the sensor kinases Tel1 and
Mec1 in Saccharomyces cerevisiae, and ATM and ATR in human, phosphorylate multiple mediators which activate
effector proteins to initiate cell cycle checkpoints and DNA repair. A subset of kinase substrates are recognized by
the S/T-Q cluster domain (SCD), which contains motifs of serine (S) or threonine (T) followed by a glutamine (Q).
However, the full repertoire of proteins and pathways controlled by Tel1 and Mec1 is unknown.

Results: To identify all putative SCD-containing proteins, we analyzed the distribution of S/T-Q motifs within
verified Tel1/Mec1 targets and arrived at a unifying SCD definition of at least 3 S/T-Q within a stretch of 50 residues.
This new SCD definition was used in a custom bioinformatics pipeline to generate a census of SCD-containing
proteins in both yeast and human. In yeast, 436 proteins were identified, a significantly larger number of hits than
was expected by chance. These SCD-containing proteins did not distribute equally across Gene Ontology (GO) terms, but
were significantly enriched for those involved in processes related to the DDR. We also found a significant
enrichment of proteins involved in telophase and cytokinesis, protein transport and endocytosis, suggesting
possible novel Tel1/Mec1 targets in these pathways. In the human proteome, a wide range of similar proteins were
identified, including homologs of some SCD-containing proteins found in yeast. This list also included high
concentrations of proteins in the Mediator, spindle pole body/centrosome and actin cytoskeleton complexes.

Conclusions: Using a bioinformatic approach, we have generated a census of SCD-containing proteins that are
involved not only in known DDR pathways but also in several other pathways under Tel1/Mec1 control, suggesting new
putative targets for these kinases.
Background

The conserved DNA damage response (DDR) pathway proceeds as a highly coordinated cascade of cellular
events under the control of the phosphatidylinositol 3-kinase-related kinases (PIKKs), most notably Tel1 and
Mec1 in Saccharomyces cerevisiae and their homologs ATM and ATR, respectively, in human [1,2]. During the
DDR, sensor proteins detect DNA damage and then recruit and activate multiple proteins that mediate and
transduce signals to elicit, among others, transcriptional programs, cell cycle arrest, DNA repair activity and, in
the setting of irreparable damage, apoptosis or senescence [1-5]. In S. cerevisiae under genotoxic stress, Tel1
and Mec1 activate the DDR by phosphorylating key mediators Chk1, Rad53, Mrc1 and Rad9, and others,
resulting in the halt of DNA replication and cell cycle progression at G1 and S phases or at G2/M transition
[5]. These events are coordinated with global changes in transcriptional patterns and DNA repair activation to
ensure that the cell cycle progresses and DNA replication resumes once the damage is repaired. In addition,
the discoveries of Hop1 as a downstream effector of Tel1/Mec1 signaling and defective telomerase recruit-
Recently, a series of large-scale studies has suggested that the
number of Tel1/Mec1 targets is much higher than initially
estimated. A high-throughput analysis in yeast treated
with DNA-damaging reagents identified 355 proteins
phosphorylated at S/T-Q sites [7]. A similar approach in
human cell lines treated with UV radiation led to the
identification of 570 phosphosites [8]. An additional
search for peptides phosphorylated at ATM/ATR consensus
sites in response to ionizing radiation yielded
more than 700 putative protein targets, many of which
lacked functional characterization of their S/T-Q
phosphorylation sites [9]. While many of these phospho-targets
function in DDR pathways, others belong to
pathways that were not known to be under ATM/ATR
control. Therefore, alternative methods to obtain a full
census of Tel1/Mec1 substrates might delineate additional
functions of these kinases beyond the DDR.

Tel1/Mec1 kinases phosphorylate well-known DDR
proteins at S/T-Q consensus sites. In some targets, these
sites appear to be concentrated within a relatively short
stretch of sequence previously defined as at least 3 S/T-Q
within 100 amino acids, the so-called S/T-Q cluster
domain (SCD) [4]. To date, an SCD has been confirmed
in just 11 Tel1/Mec1 targets, with many of these having
more than one SCD (Figure 1 and Additional file 1:
Table S1). Not all S/T-Q sites within a given SCD are
phosphorylated, although all 11 of these SCD proteins
possess at least 1 Tel1/Mec1-phosphorylated S/T-Q
within an SCD.

Although the functions of SCDs are not completely
understood, they often mediate protein-protein
interactions during signaling cascades [4]. For instance,
a single phosphorylation of the Rad53 SCD promotes
dimerization, whereas a double phosphorylation triggers
Rad53 binding to the FHA domain of Dun1 [10]. Similarly,
sustaining the DNA damage signal requires
oligomerization of Rad9 proteins at DNA breaks
through the interaction of its BRCT domain and
phosphorylated SCD [11]. These examples suggest that SCDs
are biologically relevant domains with important roles
during the DDR.

Based on the original SCD definition (at least 3 S/T-Q
motifs within 100 amino acids), more than 25% of the
proteins in the S. cerevisiae proteome contain an SCD.
To better discriminate against false positives, we used a
more stringent definition of the SCD to identify potential
Tel1/Mec1 targets. The final set of targets contained
436 proteins, including the 11 known SCD-containing
Tel1/Mec1 targets. This SCD census was enriched for
proteins in DDR-related pathways such as cell cycle progression
and checkpoints, DNA repair and transcriptional
regulation. In addition, we observed an over-representation
of proteins with roles in several pathways previously only
weakly linked to Tel1/Mec1. Similar results were obtained
when the new SCD definition was applied to generate a
human SCD census.

Results 

Obtaining a census of SCD proteins 

The SCD in S. cerevisiae was previously defined as a region
with at least 3 S/T-Q within 100 residues. Examination
of the 11 known SCD proteins revealed that the SCD
could be defined as having 3 S/T-Q within just 42 amino
acids (Figure 1A, and Additional file 1: Table S1). To refine
our census while slightly relaxing this 42-residue bound, we used
ScanProsite to search the UniProt database for S. cerevisiae
proteins containing at least 3 S/T-Q within a stretch
of 50 residues or less. We found a total of 436 proteins,
each having at least one SCD region (Additional file 2:
Table S2). This number was significantly higher than the 
147 SCD proteins expected to be present in the yeast 
proteome by chance {p < 10"^; see Methods) suggesting 
SCDs are indeed biologically relevant units rather than 
stochastic events. Because the probability of finding an S/T-Q motif
by chance alone increases with protein length,
we examined the length distribution of the SCD-containing
proteins. A goodness-of-fit test using
Pearson's chi-square test showed that the distribution
of protein lengths in our census is not statistically different
from a log-normal distribution (p = 0.285) (Figure 1B).
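
The SCD rule used for the census (at least 3 S/T-Q motifs within a stretch of 50 residues) can be illustrated with a simple sequence scan. The Python sketch below is illustrative only and is not the ScanProsite query used to build the census. The function name `find_scds`, its parameters, the merging of overlapping hits, and the choice to measure the stretch from the S/T of the first motif to the Q of the last motif are all assumptions.

```python
import re

# An S/T-Q motif: serine or threonine immediately followed by glutamine.
MOTIF = re.compile(r"[ST]Q")


def find_scds(sequence, min_motifs=3, max_span=50):
    """Return merged (start, end) regions, 1-based inclusive, that satisfy the rule.

    Assumption: the stretch is measured from the S/T of the first motif
    to the Q of the last motif, both included.
    """
    starts = [m.start() for m in MOTIF.finditer(sequence.upper())]
    regions = []
    for i in range(len(starts) - min_motifs + 1):
        first = starts[i]                        # 0-based index of the first S/T
        last_q = starts[i + min_motifs - 1] + 1  # 0-based index of the final Q
        if last_q - first + 1 <= max_span:
            region = (first + 1, last_q + 1)     # convert to 1-based inclusive
            if regions and region[0] <= regions[-1][1]:
                # Overlapping hits are merged into one candidate SCD region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], region[1]))
            else:
                regions.append(region)
    return regions


# Toy sequence (not a real protein): three clustered motifs, then a distant one.
toy = "MA" + "SQ" + "G" * 10 + "TQ" + "G" * 10 + "SQ" + "K" * 60 + "SQ"
print(find_scds(toy))  # prints [(3, 28)]
```

Under this reading, the original definition corresponds to `max_span=100` and the 42-residue observation to `max_span=42`, so the effect of the window size on the number of hits can be explored by changing one argument.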

Empirical support for our SCD definition could be
found in several ways. First, 100 of the identified proteins
had evidence of phosphorylation at S/T-Q sites in mass
spectrometry phosphoproteomic studies, with 60 of those
occurring within an SCD (Figure 1C and Additional file 1:
Table S1, Additional file 3: Table S3 and Additional file 4:
Table S4) [9]. Second, of the 28 Mec1/Tel1-dependent
and Rad53-independent phosphoproteins that were
induced after exposure of wild-type and rad53Δ yeast to
methyl methanesulfonate, 7 were present in our list
(expected overlap of 1.787 proteins, p = 2.575e-04) [12].
Third, our list also contained 13 of the 58 proteins that
were found as Tel1/Mec1 targets in a quantitative mass
spectrometry analysis (expected overlap of 3.702 proteins,
p = 1.139e-05) [13]. Fourth, additional similarities with
other reports were uncovered in the amino acids flanking
those SCDs that contained phosphorylated S/T-Q (pS/T-Q)
motifs (Figure 1D). Serine residues were frequently
found upstream of pS/T-Q, whereas glutamic acid residues
were enriched at the +2 position. These features
matched those of DNA damage-induced pS/T-Q sites
in human proteins [9]. Therefore, our SCD definition of
3 S/T-Q within 50 amino acids identified proteins with
empirical data supporting DDR-related functions.
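
The expected overlaps quoted above are consistent with a simple random-sampling model, in which the expected number of census proteins in an independent list is the list size times the census size divided by the background size. The Python sketch below illustrates that model. The background size of 6,831 proteins is an assumption, chosen only because it reproduces the quoted expectations (1.787 and 3.702). The hypergeometric tail is likewise one common choice of test and may not be the test used here, so it should not be expected to reproduce the reported p-values.

```python
from math import comb

# Hypothetical background size: chosen only because it reproduces the
# expected overlaps quoted in the text; it is not stated in the source.
PROTEOME_SIZE = 6831
CENSUS_SIZE = 436  # size of the yeast SCD census reported in the text


def expected_overlap(list_size, census_size=CENSUS_SIZE, proteome_size=PROTEOME_SIZE):
    """Mean number of census proteins in a random list of `list_size` proteins."""
    return list_size * census_size / proteome_size


def overlap_p_value(observed, list_size, census_size=CENSUS_SIZE, proteome_size=PROTEOME_SIZE):
    """Hypergeometric upper tail: P(overlap >= observed) under random sampling."""
    outside = proteome_size - census_size
    total = comb(proteome_size, list_size)
    upper = min(list_size, census_size)
    tail = sum(
        comb(census_size, k) * comb(outside, list_size - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


print(round(expected_overlap(28), 3))  # 28-protein list [12]
print(round(expected_overlap(58), 3))  # 58-protein list [13]
print(overlap_p_value(7, 28), overlap_p_value(13, 58))
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library and keeps the tail sum free of floating-point underflow until the final division.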

Analysis of the GO-Slim ontology terms associated
with the 436 yeast SCD proteins showed a non-random
distribution and a concentration in specific functions,
processes and components (Additional file 5: Figure S1).

Figure 1 SCD-containing proteins in S. cerevisiae. (A) List of eleven reported SCD domains [4]. For each protein, the number of S/T-Q sites
within the SCD(s) is indicated with the amino acid positions in brackets. Graphical representations of the approximate location of the SCD along
the length of the protein are shown on the right with reported phosphorylation sites from UniProt. For references, see Additional file 1: Table S1.
(B) Distribution of the lengths of SCD-containing proteins as compared to a log-normal distribution (p-value = 0.2855). The length of protein
(x-axis) is plotted against the frequency of occurrence in either the census (black) or the yeast proteome (red). (C) A pie chart showing the
proportions of SCD proteins in the census that are known or novel, apportioned by whether pS/T-Q sites are characterized within the SCD.
(D) Alignment of amino acids flanking known phosphorylated S/T-Q motifs in yeast, with S/T being position 0. Each unique amino acid is given a
color, with the size of letter indicating the proportion of motifs having that amino acid at the position indicated. Evidence of phosphorylation
was obtained from UniProt for both reported SCDs of Tel1/Mec1 targets (left) and for all known and putative SCD domains in the census (right).
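
The flanking-residue alignment described in panel (D) amounts to tallying, for each offset from the phosphorylated S/T (position 0), how often each amino acid occurs. A minimal Python sketch of that tally follows. The function name `flank_frequencies`, its input format, and the two toy peptides are hypothetical and do not come from the census data.

```python
from collections import Counter


def flank_frequencies(sites, flank=5):
    """Per-offset residue frequencies around phosphorylated S/T positions.

    `sites` is a list of (sequence, index) pairs, where `index` is the
    0-based position of the phosphorylated S or T (offset 0).
    """
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for sequence, index in sites:
        for offset in range(-flank, flank + 1):
            pos = index + offset
            if 0 <= pos < len(sequence):  # skip offsets past the sequence ends
                counts[offset][sequence[pos]] += 1
    return {
        offset: {aa: n / sum(c.values()) for aa, n in c.items()}
        for offset, c in counts.items()
        if c
    }


# Toy peptides only (hypothetical): S upstream of the site and E at +2.
toy_sites = [("AASSQEAA", 3), ("GGSTQEGG", 3)]
for offset, freqs in sorted(flank_frequencies(toy_sites, flank=2).items()):
    print(offset, freqs)
```

In a sequence-logo style figure, the letter height at each offset would be proportional to these frequencies.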



As anticipated, we found significant enrichment in
ontology terms that are usually associated with DDR
proteins. Like known Tel1/Mec1 targets, the identified
SCD-containing proteins tended to be nuclear, responsive
to stress signals, involved in phosphorylation and
signal transduction, and have roles in DDR-related pathways
such as cell cycle progression and transcriptional
regulation (Additional file 5: Figure S1). When the analysis
was extended to all GO terms, those related to
DDR pathways remained significantly over-represented
(Figure 2), validating our approach. As expected for a list
enriched for DDR genes, we found a significant enrichment
in proteins involved in cell cycle progression, transcription,
DNA replication and DNA repair (Figure 2).

In addition, our yeast SCD census contained a significant
enrichment of proteins associated with protein
localization and endocytosis, suggesting a broader and
more direct role for Tel1/Mec1 in these pathways than
previously thought (Figure 2 and Table 1) [14]. The yeast
census was also enriched for proteins involved in cytokinesis
and cell division, pathways where a role for Tel1/Mec1
kinases was previously hinted at (Figure 2 and
Table 1). These included components of the cortical actin
cytoskeleton, which control later stages of mitosis
to ensure both proper nuclear migration to the cellular
bud and cell division during cytokinesis (Figure 2 and
Table 1) [15]. This may also explain the enrichment of
SCD proteins from the cellular bud and cell cortex in
the census (Figure 2 and Table 1). In addition, several
proteins involved in the spindle assembly checkpoint
(SAC) and spindle position and orientation checkpoint,
as well as other networks controlling mitotic exit, were
found to contain an SCD (Table 1 and Figure 3A) [16].
Consistent with a putative role for Tel1 and Mec1
kinases in monitoring key transition steps in mitosis, we
found a concentration of SCD proteins localized to the
Figure 2 Gene Ontology terms enriched in the S. cerevisiae SCD census. A bar graph showing the percentage of SCD proteins significantly
associated with Gene Ontology process (left), component (center) and function (right) terms as compared to the percentages of all yeast
proteins (red bars) having the same associations. The significant p-values (p < 0.05) are shown as a line graph, with its axis on the right.
Table 1 Selection of putative SCD-containing Tel1/Mec1 targets

Kinetochore

Bub1     Spindle assembly protein.
Cbf2     Essential kinetochore protein, component of the CBF3 multisubunit complex.
Cnn1     Kinetochore protein of unknown function.
Mad1     Spindle assembly protein.
Spc105   Subunit of a kinetochore-microtubule binding complex that bridges centromeric heterochromatin and kinetochore. Required for kinetochore binding of SAC proteins.
Tid3     Component of the evolutionarily conserved kinetochore-associated Ndc80 complex.

Mitotic spindle

Bub2     Mitotic exit network regulator.
Nud1     Component of the spindle pole body outer plaque, required for exit from mitosis.
Spc72    Binds spindle pole bodies and links them to microtubules.
Spc97    Interacts with Spc110p at the spindle pole body (SPB) inner plaque and with Spc72p at the SPB outer plaque.
Spc110   Inner plaque spindle pole body (SPB) component.
Stu1     Component of the mitotic spindle that binds to interpolar microtubules.

Cytokinesis

Cdc3     Component of the septin ring of the mother-bud neck that is required for cytokinesis.
Cla4     Involved in septin ring assembly and cytokinesis.
Cts1     Required for cell separation after mitosis.
Dse4     Degrades cell wall from the daughter side causing daughter to separate from mother.
Egt2     Required for proper cell separation after cytokinesis.
Elm1     Serine/threonine protein kinase that regulates cellular morphogenesis, septin behavior, and cytokinesis.

Cell bud

Ace2     Transcription factor that activates expression of early G1-specific genes, localizes to daughter cell nuclei after cytokinesis and delays G1 progression in daughters.
Buds     Protein involved in bud-site selection and required for axial budding pattern; localizes with septins to bud neck in mitosis.
Buds     Protein involved in bud-site selection.
Num1     Protein required for nuclear migration, localizes to the mother cell cortex and the bud tip.
She3     Part of the mRNA localization machinery that restricts accumulation of certain proteins to the bud.

Cortical actin cytoskeleton

Akl1     Ser-Thr protein kinase involved in endocytosis and actin cytoskeleton organization.
Ark1     Serine/threonine protein kinase involved in regulation of the cortical actin cytoskeleton.
Bni1     Formin, nucleates the formation of linear actin filaments, involved in cell processes such as budding and mitotic spindle orientation.
Las17    Actin assembly factor, activates the Arp2/3 protein complex that nucleates branched actin filaments.
Ndl1     Regulates dynein targeting to microtubule plus ends.
Sda1     Required for actin cytoskeleton organization.



kinetochore and spindle pole body, two key subcellular 
compartments central to the SAC and completion of mi- 
tosis (Table 1 and Figure 3A). 

Our yeast SCD census also uncovered proteins not
known to be Tel1/Mec1 targets, but with characterized
roles in pathways well known to be regulated by Tel1/Mec1
kinases. The pathways included DNA repair, DNA
replication, gene expression, meiosis, and telomere homeostasis
(Figure 2 and Figure 3C). For example, there was an
over-representation of proteins influencing RNA pol II-dependent
transcription (Figures 2 and 3D), such as components
of the pre-initiation complex and Mediator as well
as members of the SAGA and COMPASS complexes
(Figure 3D). Furthermore, several DDR transcription factors
such as Rfx1 were also found to contain an SCD, raising
the possibility that they may be under direct control of
Tel1/Mec1 kinases [17,18]. Many of these SCD-containing
proteins belong to groups of proteins influencing more
than one known DDR-related pathway. For instance, most
of the SCD-containing nucleases and helicases involved in DNA
double-strand break repair have also been
associated with replication and telomere homeostasis
(Figure 3C). Similarly, several SCD proteins are involved in
sumoylation, ubiquitination, chromatin remodeling and the
establishment of sister chromatid cohesion, which are activities
known to influence several DDR pathways such as
DNA replication, transcription regulation, DNA repair, and
cell cycle progression (Figure 3C). Another example of
crosstalk among DDR-related pathways by SCD proteins in
our census is a subset of transcription factors that ensure
proper transitions between phases of the cell cycle
(Figure 3B). This suggests that these potential novel Tel1/Mec1
targets may serve as a link between cell cycle progression
and global transcription changes, two key components of
the DDR.

Census of human SCD proteins overlaps with the yeast 
census 

For a given SCD-containing protein, the presence of an
SCD in orthologues increases the probability that the SCD is a
biological entity rather than a random occurrence. For
this reason, we searched the human proteome for proteins
containing this newly defined SCD. This census identified
2,193 proteins, including 13 of the 17 proteins used to
formulate the original SCD definition (Additional file 6:
Table S5) [4], and 188 of the 700 proteins found to
Figure 3 SCD-containing proteins in the S. cerevisiae SCD census. (A) Schematic of how Tel1/Mec1 may be directing the G2, Spindle
Assembly, Spindle Position checkpoints, Mitotic Exit Network, and regulation of Ace2p transcription factor and polarized morphogenesis (RAM)
network. Members of these checkpoints with putative SCD domains are shown in red. (B) Schematic of cell cycle progression gene clusters that
are regulated by transcription factors with putative SCDs. (C) Schematic of categories of SCD-containing proteins that are involved in DNA
structure and integrity. (D) Schematic of RNA pol II-related complexes whose members were identified as having SCD domains.



contain pS/T-Q sites following DNA damage [4,9]. The 
extremely low probability of coincidence between our 
human SCD census and these experimental data (p ≈ 0)
indicates that human SCD proteins are significantly 
enriched for proteins known to be phosphorylated fol- 
lowing DNA damage [9]. Furthermore, the ontology 
terms over-represented in this list overlapped largely 
with those found experimentally in human DDR pro- 
teins and with our yeast SCD census. These included 
terms such as DNA repair, cell cycle progression, gene 
expression, DNA replication, and response to DNA 
damage {p <3.5 x 10'^) (Figure 4) [9]. 

As with the yeast SCD census, we found an unexpected
abundance of proteins belonging to the microtubule organizing
center and spindle as well as the actin cytoskeleton
(Figure 4). Also similar to yeast, the human SCD census
was enriched in Mediator and key DDR transcription factors
such as the human homologue of RFX1 (Figures 4, 5B
and Table 2). Network analysis of the human SCD census
revealed connections between SCD proteins involved in cell
signaling pathways including JNK, ERK, RAS, AKT, calmodulin
signaling, and NF-κB, a pathway in which ATM
plays important roles (Figure 5) [19]. Remarkably, these
networks included insulin signaling proteins which are
heavily phosphorylated at S/T-Q sites after DNA damage,
suggesting these phosphorylations may occur in the context
of an SCD (Figure 5C) [9]. Finally, as found in the network
analysis of the yeast SCD census, one of the human SCD
networks included several components of the nuclear pore
complex as well as proteins involved in nucleocytoplasmic
transport like importin-beta and exportin 1 (Xpo1)
(Figure 5A).

Discussion 

The SCD is neither a motif nor a true protein domain in 
that a consensus alignment cannot completely define the 
Figure 4 Gene Ontology terms enriched in the SCD census for H. sapiens. A bar graph showing the percentage of SCD proteins significantly
associated with Gene Ontology process (left), component (center) and function (right) terms as compared to the percentages of all human
proteins (red bars) having the same associations. The significant p-values (p < 0.05) are shown as a line graph, with its axis on the right.



region and there is variable spacing between each S/T-Q.
This has made its identification in proteins difficult,
relying on loose definitions that extend to include more than 25% of
the yeast proteome. Using a more stringent SCD definition
of a sequence containing at least 3 S/T-Q in a
stretch of 50 amino acids, we arrived at a refined census
of 436 proteins in the yeast proteome, still a much larger
number than expected at random. The validity of this
approach is supported by the enrichment of proteins
phosphorylated at S/T-Q sites in mass spectrometry
studies and the presence of all well-characterized SCD-containing
proteins phosphorylated by Tel1/Mec1. In
addition, ontology terms related to the DDR are significantly
over-represented in this census. We propose that
this newly defined SCD can be used to predict new roles
for Tel1/Mec1 during the DDR and to identify novel putative
targets for these kinases.

While the presence of an SCD in a protein may have
arisen stochastically, the existence of several SCD proteins
in the same pathway is much more unlikely to occur by chance. Therefore,
the definition has a higher predictive value when
assigning new processes regulated by Tel1/Mec1. Similarly,
for a given SCD-containing protein, the presence
of an SCD in homologues in other organisms increases
the probability that the SCD is a biological entity and
not randomly generated. For this reason, we searched
the human proteome for proteins matching this newly
defined SCD to look for similarities and differences.
Table 2 shows a list of interesting yeast proteins in our
census whose human orthologue either contains pS/T-Q
sites, possesses an SCD in its sequence, or both. These
genes are likely to be Tel1/Mec1 targets in yeast and, in
fact, several of them were phosphorylated at S/T-Q sites
in high-throughput mass spectrometry approaches [9].

Mitosis 

As hinted by previous reports, the presence of SCDs in
several SAC proteins such as Bub1, Mad1, and Cdc20
indicates Tel1 and/or Mec1 may control cell cycle progression
at the metaphase-anaphase transition in
addition to their well-known roles in the G2/M, G1 and
S checkpoints [20,21]. Consistent with this, Bub1, Mad1
and Cdc20 have phosphorylated S/T-Q sites after DNA
damage, and exit from mitosis was recently shown to be
regulated by Tel1/Mec1 in yeast and by ATM/ATR in
humans [9,16,21,22]. The significant enrichment of
SCD-containing proteins involved in later stages of mitosis
and cell division, including these and other putative
novel SCD targets in the SAC, the spindle orientation
checkpoint and cytokinesis, seems to emphasize the notion
that Tel1/Mec1 is very active during these
processes.

While the presence of SCD proteins in the kinetochore
relates to its functional role as a reservoir for SAC proteins,
the presence of SCD proteins in the spindle pole
body suggests an unknown role for Tel1/Mec1 in monitoring
spindle formation and orientation during mitosis.
Consistent with this, the human Tel1 homologue ATM
resides in the centrosome, which we found was significantly
enriched with SCD proteins [23]. Interestingly,
several members of the yeast spindle orientation checkpoint
such as Bub2 are SCD proteins. Bub2 resides in
the spindle pole body and activates the mitotic exit network
once the spindle has been correctly positioned, providing
a link between spindle orientation and mitotic
progression into cytokinesis [24,25]. The presence of an
SCD in Bub2 suggests that this surveillance mechanism
may also be under Tel1/Mec1 control.

In addition to microtubules, both the yeast and human
proteomes have a significant concentration of SCD proteins
in the actin cytoskeleton. In yeast, several of these
localize to the cellular bud and cell cortex to direct nuclear
migration, spindle orientation, nuclear division and
cell division during cytokinesis. For instance, Bni1, an
SCD protein in both yeast and human, is a formin protein
that organizes actin filaments and is involved in mitotic
spindle orientation [26]. Deletion of RAD53 or
CHK1 in yeast causes aberrant mitotic movements of
the nucleus into the bud neck without triggering
anaphase, suggesting the DDR machinery also controls
nuclear migration in mitosis [22]. In addition, several
yeast SCD proteins form the contractile ring during
cytokinesis [27]. Examples of such proteins are Cla4, a
protein involved in ring assembly, and Cdc3, a septin
which is a component of the contractile ring and whose
human homologue also contains an SCD and has pS/T-Q
sites upon DNA damage. Having functional SCDs in
these processes would strengthen the notion that crosstalk
occurs among the actin cytoskeleton governing nuclear
migration, cytokinesis and the DDR.

To complete mitosis in yeast, the mitotic exit network
(MEN) must be inactivated and the daughter cell completely
separated from the mother cell. Two transcription
factors, Amn1 and Ace2, play key roles in these
steps and contain sequences that meet our SCD definition.
Amn1 acts by downregulating MEN, whereas Ace2
is restricted to the daughter cell where it activates several
chitinases and glucanases that sever remaining links
between bud and mother cell [28]. Moreover, Mob2 is
another SCD-containing protein belonging to the RAM
Table 2 Similarities between yeast and human SCD proteins

Selection of yeast SCD genes with human ortholog SCD genes

Yeast        Human        Function
Dna2*        (DNA2)       Nuclease and helicase required for Okazaki fragment processing; involved in DNA repair.
Fkh2         (FOXJ3)      Transcription factor with a major role in the expression of G2/M phase genes.
Gcn2*        (EIF2AK4)    Protein kinase, phosphorylates eIF2 (Sui2p) in response to starvation; contributes to DNA damage checkpoint control.
Pol5         (MYBBP1A)    DNA Polymerase phi; not required for chromosomal DNA replication; required for the synthesis of rRNA.
Nup100/116   (POM121)     Subunit of the nuclear pore complex (NPC); interacts with mRNA export factor Mex67p and with Kap95p.
Rad54*       (RAD54L)     Involved in the recombinational repair of double-strand breaks.
Rfx1         (RFX1)       Major transcriptional repressor of DNA-damage-regulated genes.
Smc2         (SMC2)       Subunit of the condensin complex.
Taf9**       (TAF9B)      Subunit of TFIID and SAGA complexes.
Tif4631*     (EIF4G2)     Translation initiation factor eIF4G, subunit of the mRNA cap-binding protein complex (eIF4F).

Selection of yeast SCD genes whose human ortholog contains phosphorylated S/T-Q sites

Bub1         (BUB1)       Protein kinase that plays crucial roles in the Spindle Assembly Checkpoint.
Cdc20        (CDC20)      Cell-cycle regulated activator of APC/C, which is required for metaphase/anaphase transition.
Cdc4         (FBXW7)      F-box protein that controls cell cycle function, sulfur metabolism, and methionine biosynthesis.
Hrr25        (CSNK1D)     Protein kinase involved in regulating vesicular trafficking, DNA repair, and chromosome segregation.
Mad1         (MAD1)       Coiled-coil protein involved in the spindle-assembly checkpoint.
Mcm1         (SRF)        Transcription factor involved in cell-type-specific transcription and pheromone response.
Mcm6         (MCM6)       Protein involved in DNA replication; component of the Mcm2-7 hexameric pre-replicative complex.
Mih1*        (CDC25A)     Protein tyrosine phosphatase involved in cell cycle control; regulates the phosphorylation state of Cdc28p.
Mlh1**       (MLH1)       Protein required for mismatch repair in mitosis and meiosis as well as crossing over during meiosis.
Msh3         (MSH3)       Mismatch repair protein.
Pds5**       (APRIN)      Protein required for establishment and maintenance of sister chromatid condensation and cohesion.
Sch9         (RPS6KA1)    Kinase involved in transcriptional activation of osmostress-responsive genes; regulates G1 progression.
Tid3**       (NDC80)      Part of the kinetochore-associated Ndc80 complex involved in chromosome segregation.
Ubr1         (UBR1)       E3 ubiquitin ligase (N-recognin) that ubiquitinates substrates in the N-end rule pathway.
Ycs4**       (NCAPD2)     Subunit of the condensin complex.

Selection of yeast SCD genes whose human ortholog contains an SCD and is known to be phosphorylated in S/T-Q sites

Akl1*        (AAK1)       Protein kinase involved in endocytosis and actin cytoskeleton organization.
Bni1         (FMN2)       Formin, nucleates the formation of linear actin filaments; involved in cell processes such as budding and mitotic spindle orientation.
Cdc3**       (SEP6*)      Component of the septin ring of the mother-bud neck that is required for cytokinesis.
Ctf18**      (CHTF18)     Required for sister chromatid cohesion; part of the DNA damage replication checkpoint.
Exo1         (EXO1*)      5'-3' exonuclease and flap-endonuclease involved in recombination, DSB and mismatch repair.
Pol2         (POLE)       Catalytic subunit of DNA polymerase (II) epsilon.
Rad50*       (RAD50*)     Subunit of MRX complex involved in processing double-strand DNA breaks, and telomere maintenance.
Rts1         (PPP2RA*)    Regulatory subunit of protein phosphatase 2A (PP2A).
Tom1**       (HUWE1*)     E3 ubiquitin ligase of the hect-domain class; has a role in mRNA export from the nucleus.

* Phosphorylated in S/T-Q sites in mass spectrometry searches.
** Phosphorylated in S/T-Q sites within an SCD in mass spectrometry searches.


(regulation of Ace2 activity and cellular morphogenesis) 
pathway, whose function is essential for daughter cell- 
specific transcription required for cell separation [29,30]. 
Thus, SCD proteins are enriched for roles revolving 
around the end of mitosis, from the mitotic networks 
that control entry into anaphase and telophase to the 
regulation and formation of the contractile ring during 
cytokinesis to pathways that control cytokinesis and 
telophase completion. 

We also identified SCDs in yeast proteins controlling
other aspects of cell cycle progression, especially those
regulating other cell cycle boundaries. Examples are Mih1,
which is involved in G2/M transition, and Whi3 and Whi4,
which coordinate START entry with cell size [31,32]. Genes
whose expression is tightly linked to cell cycle progression
often contain specific promoter sequences that allow their
concerted and timely expression. Several transcription factors
that recognize these sequences contain an SCD, suggesting
that Tel1/Mec1 may also control cell cycle
progression by influencing the expression of cell cycle-regulated
genes. In addition, two of the major E3 ubiquitin ligase
complexes controlling cell cycle progression, APC/C
and SCF, have members in the yeast SCD census (Table 2)
[33]. For example, Cdc20 is an SCD-containing protein
belonging to the APC/C complex, which regulates the
metaphase-anaphase transition. Similarly, Cdc4 is an SCD
protein forming part of the SCF ubiquitin ligase, a complex
that regulates entry into S-phase. Moreover, the Cdc4
human orthologue, Fbxw7, is phosphorylated at S/T-Q sites
after DNA damage [9]. Cdc4 also contains a so-called F-box
that is the substrate recognition component of SCF
complexes. Related to this, 6 of the 21 known F-box proteins
in yeast were found in our census (Cdc4, Ufo1, Amn1,
Met30, Skp2 and Dia2) [34,35]. While several of these F-box
proteins play cell cycle-related roles, others are
involved in cell morphology and cell growth. Furthermore,
Mec1 is known to activate the SCF/Ufo1 complex to degrade
HO, an endonuclease involved in mating type switching.
The presence of proteins involved in protein
ubiquitination in the yeast SCD census is consistent with the fact
that in human cells several E3 ligases such as Brca1,
Mdm2, Rnf8 and Rnf168 are well-known mediators and
effectors of DDR [36-39].

DNA replication 

The yeast SCD census also contains several proteins 
performing critical roles in DNA replication, such as 
pre-replication complex members Mcm4 and Mcm6, helicase 
Dna2, licensing factor Cdt1 and polymerases Pol2 and Pol3. 
This correlates with the observations that human MCM 
members and the human homologue of Pol2 are known ATM/ATR 
targets [40,41]. In yeast, these pre-replication complex and 
replication fork proteins may be targets of the 
Mec1-dependent DNA replication checkpoint (DRC) triggered by 
replication fork stalling, which is mediated by founding SCD 
members Mrc1, Sgs1 and Rad53 [42,43]. Interestingly, the 
binding of Mrc1 to Pol2 is required to stabilize Pol2 at 
stalled replication forks [23,44,45]. Moreover, the DRC is 
dependent on Ctf18, an SCD protein whose human homologue 
contains an SCD and is phosphorylated at S/T-Q sites 
following DNA damage. Along with SCD proteins Chl1 and Pds5, 
Ctf18 is required for chromatid cohesion, a process 
regulated by the DDR in human cells through the 
phosphorylation of the SMC1 cohesin subunit by ATR 
[23,44,46,47]. SMC proteins constitute a family of ATPases 
forming the condensin and cohesin complexes as well as the 
Smc5-Smc6 complex in yeast. In addition to the cohesion 
factors above, several other SCD proteins belong to these 
complexes. For instance, Smc2 and Ycs4 are two SCD proteins 
belonging to the condensin complex, whereas Mms21, an E3 
SUMO ligase, and Nse4 belong to the Smc5-Smc6 complex, which 
is involved in DNA repair, cohesion and recovery of stalled 
replication forks [48,49]. 

DNA repair 

During the DDR, Tel1 and Mec1 coordinate the halt of cell 
cycle progression with the activation of DNA repair 
mechanisms. Consistent with this, four of the known 
Tel1/Mec1 targets with characterized SCDs are directly 
involved in DNA repair: Esc4, Slx4, Sgs1 and Sae2 
[43,50,51]. In human cells, ATM and ATR kinases directly 
target homologous recombination factors Nbs1 and Rad52 and 
mismatch repair factor Msh2 [52]. As anticipated, our yeast 
SCD census contained a significant enrichment of proteins 
associated with all types of DNA repair pathways [53]. 
Homologous recombination was the most over-represented DNA 
repair pathway, with SCD proteins involved in every step, 
including processing and resection (the MRX complex, Sae2, 
Exo1, Sgs1 and Dna2), homologous pairing and strand exchange 
(Rad51, Rad54, Rdh54), DNA synthesis (Pol2 and Pol3), 
Holliday junction resolution (Slx4, Rad1, Mms4) and 
dissolution of homologous recombination intermediates (Sgs1 
and Srs2) [50,54-58]. The MRX complex is a known sensor of 
DNA damage that recruits Tel1/Mec1 to double strand breaks 
during the DDR. Our data indicate that the MRX component 
Rad50, which is known to be phosphorylated at S/T-Q sites 
following DNA damage, contains an SCD in both yeast and 
human [59]. Furthermore, Xrs2 and the human orthologue NBS1 
are known targets of the Tel1/Mec1 and ATM/ATR kinases 
during the DDR [55]. Since the majority of known factors 
involved in end processing during double strand break repair 
contain SCDs, this process may be under tight control of 
Tel1/Mec1, which could thereby influence the choice of 
double strand break repair pathway (homologous recombination 
versus nonhomologous end joining), an outcome that depends 
on the extent of resection at the double strand breaks. Many 
of these proteins also impact telomere homeostasis and, 
therefore, the presence of SCDs in this particular group of 
proteins may reflect Tel1/Mec1 regulation of their telomeric 
functions or simply the degree of telomere end resection, as 
recently proposed [60]. 

In addition to homologous recombination, proteins 
impacting other DNA repair pathways were identified in the 
yeast SCD census. For example, mismatch repair proteins Msh3 
and Mlh1 were identified as possible Tel1/Mec1 targets, 
which corresponds with the known phosphorylation of the MSH3 
and MLH1 human homologues at S/T-Q sites after DNA damage 
[9]. Other DNA repair proteins found in our SCD census are 
Nej1, required during NHEJ, and Mms1, an E3 ubiquitin ligase 
that acts with SCD-containing Tel1/Mec1 targets Esc4 and 
Slx4 to promote replication and recovery from replication 
fork arrest on damaged DNA [61,62]. Furthermore, the 
abundance of chromatin modification proteins mentioned below 
may be related to the roles they play during DNA repair in 
addition to transcription regulation. Overall, the high 
enrichment of DNA repair proteins in our census, along with 
the concordance between the yeast and human data, suggests 
that Tel1/Mec1 may have a more significant role in directly 
phosphorylating proteins involved in DNA repair pathways 
during the DDR than currently recognized. 
Transcription regulation 

Another profound effect of inflicting DNA damage is a 
global change in transcription, which affects 5% of the 
yeast genome [63]. Not surprisingly, we found gene 
expression to be one of the most over-represented ontology 
terms in our census, which corresponded to several 
transcription factors that regulate the expression of cell 
cycle, DNA repair and DNA replication genes. One of the 
major gene expression changes during the DDR involves 
upregulation of the RNR genes, which results in a 6-8 fold 
increase in dNTP levels in cells [17,64,65]. Rfx1, a 
transcription factor that binds and regulates RNR gene 
promoters, was found in both our yeast and human SCD 
censuses. While Dun1-dependent phosphorylation of Rfx1 
during the DDR is well established, our data suggest a more 
direct role of Tel1/Mec1 in Rfx1 regulation. 

Perhaps more surprisingly, we found a significantly 
greater number of proteins in the RNA PolII pre-initiation 
and Mediator complexes in both the yeast and human SCD 
censuses than expected. This suggests that, in addition to 
gene specific transcription factors, the basal transcription 
machinery may be part of the DDR. Protein subunits of other 
complexes known to influence gene expression were also found 
to contain SCDs. For instance, we found SCDs in components 
of the histone methylation COMPASS complex (Swd3), the SAGA 
complex (Spt3, Taf5 and Taf9), the histone 
acetyl-transferase SAS complex (Sas2), the NuA4 complex 
(Eaf3 and Swc4) and the SWI/SNF and RSC remodeling complexes 
(Swi1, Rsc3 and Arp9). SCDs were also identified in several 
yeast proteins involved in heterochromatin formation, such 
as Sir1, Sir4, Rif1 and Tbf1 [66]. The abundance of 
chromatin modification proteins correlates with the way 
human TIP60 (histone acetyl-transferase) and NuA4 bind to 
Mdc1 and participate in the DDR [38]. Additionally, 
transcription factors MATα1 and MATα2, encoded at the yeast 
mating type locus, contain an SCD and bind SCD-containing 
Mcm1, further suggesting additional targets for Tel1/Mec1 
during mating type switching. 

RNA metabolism 

Our yeast SCD census was also significantly enriched for 
proteins involved in a panoply of processes required for 
mRNA processing and protein synthesis, such as mRNA capping 
(Ceg1), mRNA cleavage and polyadenylation (Mpe1, Pti1, Hrp1, 
Air1), splicing (Mud1, Mud2, Prp16, Prp22, Prp4, Prp43, 
Syf2), translation initiation (Tif4631, Rrg1, Gcn2), 
translation regulation (Mrn1), translation termination 
(Ecm32) and ribosome synthesis (Erb1, Faf1, Pol5, Rrn6, 
Ssf2, Efm1). This correlates well with studies in human 
cells which show a concentration of proteins involved in 
splicing, translation and protein synthesis among those 
phosphorylated at S/T-Q sites following DNA damage [9]. 
While Tel1/Mec1 effectors like Dun1 are known to influence 
RNA processing, our findings suggest that Tel1 and Mec1 are 
capable of directly regulating this process. 

Meiosis 

During meiosis, Mec1 phosphorylates SCD-containing 
proteins Sae2 and Hop1 [67,68]. Similar to Sae2, other 
proteins involved in homologous recombination also play 
roles during normal meiotic progression and thus, the 
presence of an SCD in their sequence may identify them as 
possible Tel1/Mec1 targets in meiosis. Consistent with this, 
the MRX complex, Sgs1 and Exo1 are all SCD-containing and 
are proposed targets of Mec1 during normal meiotic 
progression. It is also possible that Mlh1, a mismatch 
repair SCD protein involved in meiotic recombination, may 
also be a Mec1 target during meiosis. Moreover, our yeast 
SCD census identified, in addition to Hop1, other 
meiosis-specific proteins. Examples include Ime1, a 
transcription factor that serves as a master regulator of 
meiosis and triggers entry into meiosis under starvation 
conditions; Msh5 and Dmc1, proteins involved in processing 
programmed DNA double strand breaks during meiotic 
recombination; and Csm1, a kinetochore-localized protein 
required for accurate segregation of homologous chromosomes 
in anaphase I [69,70]. 

Nuclear pore 

The significant enrichment of SCD proteins that localize 
to the nuclear pore was surprising. While Rad53 
phosphorylates several nuclear pore components, evidence for 
phosphorylation of these by Tel1/Mec1, as proposed by this 
census, is lacking. The functional role of nuclear pore 
phosphorylation during the DDR is not fully understood, but 
it is known that nuclear pore components influence DNA 
repair, gene expression and telomere homeostasis, which are 
all pathways directly targeted by Tel1/Mec1. Alternatively, 
the presence of importins and other transport proteins in 
our census may indicate a direct role of Tel1/Mec1 in 
regulating transport across the nuclear membrane during the 
DDR. Consistent with this, Los1, an SCD protein which is the 
primary exon-containing tRNA exporter in yeast, is 
phosphorylated in a Mec1- and Rad53-dependent manner during 
the DDR and induces the rapid accumulation of tRNA in the 
nucleus and arrest at G1 before START [71]. Therefore, the 
Tel1/Mec1 kinases couple nucleocytoplasmic trafficking with 
cell cycle progression in the presence of DNA damage. Our 
census may have unveiled additional novel Tel1/Mec1 targets 
that also coordinate protein transport across the nuclear 
pore with other DDR pathways. For instance, Toa2, a TFIIA 
subunit, contains an SCD and is transported into the nucleus 
by an SCD-containing importin (Kap122), while Nup100 and 
Nup116 bind Mex67, the major mRNA exporter in yeast, 
suggesting Tel1/Mec1 may also couple nuclear transport with 
gene expression [72,73]. Furthermore, Kap123, an SCD 
protein, imports histones H3 and H4 into the nucleus, which 
suggests another possible mechanism by which the Tel1/Mec1 
kinases regulate DNA replication and cell cycle progression 
[74]. Finally, Kap95, the major importin of NLS-containing 
cargo proteins in yeast, has an SCD, which may provide a 
mechanism for Tel1/Mec1 to regulate several nuclear pathways 
by regulating the ability of Kap95 to transport their 
components [75,76]. 

Telomeres 

Tel1 promotes the elongation of short telomeres 
[6,77,78]. Although the telomeric Cdc13 protein can be 
phosphorylated by Tel1 in vitro, it appears not to be a Tel1 
target in vivo [60,79]. Tel1's influence on telomeres may be 
due to its effects on DNA end processing by proteins that 
function not only at double strand breaks but also at 
telomeres, as previously proposed [60]. Consistent with 
this, our yeast census identified several such 
SCD-containing proteins (Sae2, Sgs1, Dna2, Srs2, Exo1). 
Interestingly, our yeast SCD census also identified two 
additional proteins with roles in telomere homeostasis, Tbf1 
and Rif1. Tbf1 functions in parallel with Tel1 to promote 
preferential elongation of shorter telomeres [80]. One of 
the S/T-Q sites in Rif1 is phosphorylated in vivo [12] and 
it has been proposed that Tel1 phosphorylation of Rif1 may 
serve to relieve Rif1 inhibition of telomerase, downstream 
of telomerase recruitment [81]. Thus, Tel1's role in 
telomere length homeostasis is likely complex. Moreover, 
several SCD proteins are required for establishing 
heterochromatin at subtelomeric regions (Sir4, Rif1 and 
Tbf1), further expanding the putative roles of Tel1 at 
telomeres. 

Cell signaling 

While phosphoproteome analysis revealed several putative 
Tel1/Mec1 targets that localize exclusively to the 
cytoplasm, the presence of Tel1/Mec1 in cellular 
compartments other than the nucleus remains to be 
demonstrated. In contrast, human ATM/ATR localize, in part, 
in the cytoplasm where they function in endocytosis and 
several cell signaling pathways. For instance, ATM plays 
roles in NF-κB signaling where, upon DNA damage, ATM binds 
and phosphorylates NEMO and translocates to the cytoplasm 
[19]. Consistent with this, our human SCD census identified 
protein networks involved in NF-κB and other cell signaling 
pathways such as the ERK, insulin, JNK, RAS and AKT 
pathways. Also in humans, ATM is known to induce autophagy 
in the presence of reactive oxygen species by repressing the 
TORC pathway. In yeast, the TORC pathway elicits a response 
to nutrient deprivation and metabolic stress by inducing 
transcriptional activation of metabolic genes, repressing 
protein synthesis and inducing autophagy [82]. Our yeast SCD 
census revealed a number of proteins involved in the TORC 
pathway and other nutrient signaling mechanisms, suggesting 
that in yeast, as in human cells, Tel1/Mec1 may regulate the 
TORC pathway (Figure 6A). Similarly, we found that the 
signal transduction pathway controlling glycerol production 
in response to hyperosmotic stress is highly enriched in 
SCD-containing proteins, suggesting Tel1/Mec1 may contribute 
to the response to this type of stress as well (Figure 6B) 
[83-85]. 

Endocytosis 

Although ATM has been found in endocytic vesicles, its 
precise role in endocytosis remains to be determined. 
Surprisingly, our yeast SCD census was significantly 
enriched for proteins involved in endocytosis, indicating 
that Tel1/Mec1 may also be involved in endocytosis in yeast. 
Moreover, it is known that the actin cytoskeleton and 
several motor proteins are involved in transporting 
endocytic vesicles across the cytoplasm [86]. Therefore, the 
presence of SCDs in proteins involved in the cortical 
cytoskeleton may reflect their role in endocytosis in 
addition to their involvement in telophase. 

Conclusions 

Overall, we have shown that our new SCD definition can 
be used to predict pathways under control of Tel1/Mec1 and 
to identify novel putative targets for these kinases. A 
census of SCD-containing proteins in yeast has revealed a 
wide network of proteins involved in cytokinesis, mRNA 
processing, protein transport, mating type switching and 
endocytosis, suggesting that Tel1/Mec1 roles in yeast are 
broader than previously recognized and contain extensive 
parallels to pathways and targets under control of ATM/ATR 
in mammalian cells. 

Methods 

Identification of proteins based on SCD definitions 

We built a bioinformatics pipeline to systematically 
analyze a range of SCD definitions, where each SCD 
definition was specified by a maximum length (Y) and a 
minimum required number of S/T-Qs. We started with a maximum 
SCD length of 100 amino acids, based on the original SCD 
definition, and decreased this maximum length by 5 amino 
acids in each iteration until it reached 50. We also 
iteratively adjusted the required number of S/T-Qs from 5 
down to 3. We integrated ScanProsite 
(http://ca.expasy.org/tools/scanprosite/) into our 
bioinformatic pipeline to identify matching proteins. An 
example query used for the SCD definition was 
[ST]-Q-X(0,Y)-[ST]-Q-X(0,Y)-[ST]-Q for 3 S/T-Qs. The 
UniProtKB/Swiss-Prot and splice variants database under the 
Saccharomyces cerevisiae and the Homo sapiens taxonomy 
filters were used as source databases for making these 
identifications. The resulting lists were then filtered 
based on the length of the match sequence, as specified by 
the SCD definition of each iteration. 
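
The search described above can be illustrated with a minimal sketch. It is not the ScanProsite-based pipeline used in this study; it assumes a plain regular expression for the S/T-Q di-motif and assumes that the maximum length is measured from the first residue of the first di-motif to the last residue of the last one. The function name find_scds and its parameters are hypothetical.

```python
import re

# S/T-Q di-motif: serine or threonine followed by glutamine.
STQ = re.compile(r"[ST]Q")

def find_scds(sequence, min_motifs=3, max_length=50):
    """Return (start, end) spans (0-based, end exclusive) in which at
    least `min_motifs` S/T-Q di-motifs fit within `max_length` residues.

    Assumption: the span runs from the first residue of the first
    di-motif to the last residue of the last di-motif.
    """
    starts = [m.start() for m in STQ.finditer(sequence)]
    hits = []
    for i in range(len(starts) - min_motifs + 1):
        first = starts[i]
        last_end = starts[i + min_motifs - 1] + 2  # di-motif is 2 residues
        if last_end - first <= max_length:
            hits.append((first, last_end))
    return hits

# Toy sequences (not real proteins).
clustered = "SQ" + "A" * 10 + "TQ" + "A" * 10 + "SQ"
dispersed = "SQ" + "A" * 30 + "TQ" + "A" * 30 + "SQ"
```

Under these assumptions, the first toy sequence contains one qualifying span and the second contains none, because its three di-motifs are spread over more than 50 residues.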



Categorization of the sequence-matched proteins 

These proteins were then systematically annotated for 
GO (Gene Ontology) keywords, amino-acid sequences, and known 
phosphorylation sites using UniProt web services 
(http://www.ebi.ac.uk/ego/GAnnotation). The proteins were 
also manually annotated as having an SCD based on a 
literature review. At this point, we could characterize the 
proteins in our lists as having known SCDs or known 
phosphorylation sites (or both). Gene function descriptions 
in Table 1 and Table 2 were partially extracted from 
www.yeastgenome.org. 

Characterization of the protein phosphorylation sites 

For proteins with known phosphorylation site(s), we 
aligned their amino acid sequences to characterize the 
flanking amino acids. We calculated the relative frequencies 
of amino acids at positions -5 to +5 relative to the 
phosphorylation sites, and generated images to show the 
results (as shown in Figure 1D). 
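
As an illustration of this calculation, the following sketch tabulates relative amino acid frequencies at each offset around a set of phosphorylation sites. The input format (a sequence paired with a 0-based site position) and the function name flank_frequencies are assumptions made for the example, not part of the original pipeline.

```python
from collections import Counter

def flank_frequencies(sites, flank=5):
    """Relative amino acid frequencies at offsets -flank..+flank.

    `sites` is an iterable of (sequence, position) pairs, where
    `position` is the 0-based index of the phosphorylated residue
    (hypothetical input format). Offsets that fall outside a sequence
    are skipped for that sequence.
    """
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for sequence, position in sites:
        for offset in counts:
            index = position + offset
            if 0 <= index < len(sequence):
                counts[offset][sequence[index]] += 1
    frequencies = {}
    for offset, counter in counts.items():
        total = sum(counter.values())
        frequencies[offset] = (
            {aa: n / total for aa, n in counter.items()} if total else {}
        )
    return frequencies

# Toy sites (not real proteins): an SQ and a TQ site at index 5.
toy_sites = [("AAAAASQAAAA", 5), ("GGGGGTQGGGG", 5)]
toy_freqs = flank_frequencies(toy_sites)
```

For the two toy sites, offset 0 is split evenly between S and T, and offset +1 is always Q, as expected for S/T-Q sites.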

Statistics 

The expected number of proteins containing an SCD can be 
calculated by modeling each protein i as a Bernoulli random 
variable. The random variable is defined with probability 
p_i, where p_i is the probability of the event that an SCD 
occurs in protein i with length L_i. The sum of the 
probabilities over all the proteins in the yeast genome is 
the expected number of proteins containing an SCD [87]. We 
estimate each probability p_i using a Poisson process N(t) 
with rate parameter λ, where N(t) is defined as the number 
of S/T-Q sites occurring up to amino acid position t. We 
estimated λ by calculating the rate of S/T-Q sites per 
protein and then dividing it by the length of the protein to 
obtain a rate of S/T-Q sites per amino acid specific for 
each protein. Next, we defined the SCD event as at least 
three S/T-Q di-motifs occurring within a given stretch of 
amino acids. The probability of this event follows from the 
Poisson process N(t) with rate parameter λ defined in terms 
of S/T-Q sites per amino acid for each protein. The sum of 
the probabilities p_i over all the sequences is the expected 
number of sequences containing an SCD. For instance, the 
expected number of SCD-containing proteins for 3 di-motifs 
within a stretch of 50 amino acids is 147. 
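
The structure of this calculation can be sketched as follows. The text does not spell out how p_i is derived from the Poisson process, so this example makes two explicit assumptions: λ for each protein is its own S/T-Q count divided by its length, and p_i is approximated by treating the protein as a series of independent, non-overlapping windows. The approximation ignores clusters that straddle window boundaries, so it is not expected to reproduce the value of 147 reported here. All function names are hypothetical.

```python
import math

def window_probability(rate, window, min_motifs=3):
    """P(N(window) >= min_motifs) for a Poisson process with `rate`
    S/T-Q sites per amino acid."""
    mu = rate * window
    below = sum(
        math.exp(-mu) * mu ** k / math.factorial(k) for k in range(min_motifs)
    )
    return 1.0 - below

def scd_probability(n_sites, length, window=50, min_motifs=3):
    """Approximate p_i for one protein.

    Assumptions (not from the paper): rate = n_sites / length, and the
    protein is split into independent non-overlapping windows.
    """
    if length <= 0:
        return 0.0
    rate = n_sites / length
    effective_window = min(window, length)
    n_windows = max(1, length // window)
    p_window = window_probability(rate, effective_window, min_motifs)
    return 1.0 - (1.0 - p_window) ** n_windows

def expected_scd_count(proteins, window=50, min_motifs=3):
    """Sum of p_i over (n_sites, length) pairs: the expected number of
    SCD-containing proteins, by linearity of expectation."""
    return sum(scd_probability(n, L, window, min_motifs) for n, L in proteins)

# Toy proteome (S/T-Q count, length); not real data.
toy_proteome = [(0, 300), (6, 100)]
toy_expected = expected_scd_count(toy_proteome)
```

Because the expectation is a plain sum of per-protein probabilities, any alternative estimate of p_i (for example, one obtained by simulation) could be substituted in scd_probability without changing the final summation step.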

For comparing gene lists from our census with pub- 
lished, experimental data, we used the hypergeometric 
distribution to test for significance in the overlap be- 
tween the two gene lists. 
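
A minimal sketch of such an overlap test is given below, using only the Python standard library. It computes the upper-tail hypergeometric probability of observing at least the given overlap; the function name, argument names and the one-sided tail are assumptions for illustration, since the exact settings used in this study are not stated.

```python
from math import comb

def hypergeom_overlap_pvalue(population, list_a, list_b, overlap):
    """P(X >= overlap) when `list_b` genes are drawn without replacement
    from `population` genes, of which `list_a` belong to the other list.

    All arguments are counts (hypothetical interface).
    """
    upper = min(list_a, list_b)
    total = comb(population, list_b)
    tail = sum(
        comb(list_a, j) * comb(population - list_a, list_b - j)
        for j in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes, two lists of 5 that overlap completely.
toy_pvalue = hypergeom_overlap_pvalue(10, 5, 5, 5)
```

In the toy example only one of the 252 possible draws gives a complete overlap, so the p-value is 1/252.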

Gene ontology analyses 

To identify GO-Slim terms over-represented in the yeast 
SCD census, we ran the genes encoding the census proteins 
through GOStat (http://gostat.wehi.edu.au/cgi-bin/goStat.pl) 
using the Saccharomyces Genome Database 
(http://www.yeastgenome.org/) with a maximum p-value of 0.01 
and a minimum number of gene products of 2. We then used 
TermFinder (http://go.princeton.edu/cgi-bin/GOTermFinder) to 
identify enriched GO terms beyond those in GO-Slim with a 
p-value cutoff of 0.01 (Bonferroni correction for p-value 
was applied and the false discovery rate was calculated). 
TermFinder was also used to identify enrichment of ontology 
terms in the human SCD census, applying the same parameters 
as in the yeast search but using GOA-Human 
(http://www.ebi.ac.uk/GOA/) as the database. 